DOCUMENT RESUME

ED 244 607                                        IR 011 129

AUTHOR        Gevarter, William B.
TITLE         An Overview of Computer-Based Natural Language Processing.
INSTITUTION   National Aeronautics and Space Administration, Washington, D.C.
REPORT NO     NASA-TM-85635
PUB DATE      83
NOTE          52p.; Best copy available.
PUB TYPE      Information Analyses (070)
EDRS PRICE    MF01/PC03 Plus Postage.
DESCRIPTORS   *Artificial Intelligence; *Computational Linguistics; *Computers; Computer Software; Grammar; Information Processing; *Language Processing; *Man Machine Systems; Program Descriptions; Research Projects; Semantics
IDENTIFIERS   Database Management Systems; *Natural Language Processing; Parsing

ABSTRACT
Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines using natural languages (English, Japanese, German, etc.) rather than formal computer languages. NLP is a major research area in the fields of artificial intelligence and computational linguistics. Commercial natural language interfaces to computers have recently entered the market and the future looks bright for other applications as well. This report presents an overview of: (1) NLP applications; (2) the three basic NLP approaches; (3) various types of grammars; (4) difficulties in semantic processing of phrases and sentences; (5) knowledge representation (KR); (6) types of syntactic parsing; (7) the relationship among semantics, parsing, and understanding; (8) NLP system types, including question answering, natural language interface (NLI), computer aided instruction (CAI), discourse, text understanding, and text generation; (9) the functions, approaches, capabilities, and limitations of NLP systems developed for research purposes; (10) current research NLP systems; (11) the approximate price, developer, purpose, and features of some commercial NLP systems; (12) the state of the art of NLP systems; (13) problems and issues related to language use, linguistics, conversation, processor design, database interfaces, and text understanding; (14) research required in these areas; (15) the principal United States and foreign participants in NLP, and the principal U.S. government agencies funding NLP research; (16) future trends and expectations; and (17) further sources of information. A 39-item bibliography and a glossary are also provided. (Author/ESR)



Reproductions supplied by EDRS are the best that can be made from the original document.



N83-24193

An Overview of Computer-Based Natural Language Processing



National Aeronautics and Space Administration 
Washington, D.C.



1983 



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.




NASA Technical Memorandum 85635






An Overview of Computer-Based
Natural Language Processing

William B. Gevarter

OFFICE OF AERONAUTICS AND SPACE TECHNOLOGY
NASA Headquarters
Washington, D.C. 20546

National Aeronautics and Space Administration
Scientific and Technical Information Branch

NATIONAL TECHNICAL INFORMATION SERVICE
U.S. DEPARTMENT OF COMMERCE
SPRINGFIELD, VA. 22161



1. Report No.: NASA TM-85635
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: AN OVERVIEW OF COMPUTER-BASED NATURAL LANGUAGE PROCESSING
5. Report Date: April 1983
6. Performing Organization Code: RTC-6
7. Author(s): William B. Gevarter
8. Performing Organization Report No.:
9. Performing Organization Name and Address: National Aeronautics and Space Administration, Washington, D.C. 20546
10. Work Unit No.:
11. Contract or Grant No.:
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, D.C. 20546
13. Type of Report and Period Covered: Technical Memorandum
14. Sponsoring Agency Code: RTC-6

16. Abstract

Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, German, etc., in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market and the future looks bright for other applications as well.

This report reviews the basic approaches to such systems, the techniques utilized, applications, the state-of-the-art of the technology, issues and research requirements, the major participants, and finally, future trends and expectations.

It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.

17. Key Words (Suggested by Author(s)): Artificial Intelligence, Computational Linguistics, Natural Language, Machine Translation, Data Bases, Cognitive Science, Man-Machine Interface, Query Systems, Computers
18. Distribution Statement: Unclassified - Unlimited. Subject Category 61
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 5[second digit illegible]
22. Price: A04

For sale by the National Technical Information Service, Springfield, Virginia 22161

NASA-Langley, 1983






AN OVERVIEW OF COMPUTER-BASED 
NATURAL LANGUAGE PROCESSING* 



PREFACE 

Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, German, etc., in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market and the future looks bright for other applications as well.

This report reviews the basic approaches to such systems, the techniques utilized, applications, the state-of-the-art of the technology, issues and research requirements, the major participants, and finally, future trends and expectations.

It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.

*This report is part of the NBS/NASA series of overview reports on Artificial Intelligence and Robotics.



ACKNOWLEDGEMENTS 



I wish to thank the many people and organizations who have contributed to this report, both in providing information, and in reviewing the report and suggesting corrections, modifications and additions. I particularly would like to thank Dr. Brian Phillips of Texas Instruments, Dr. Ralph Grishman of the NRL AI Lab, Dr. Barbara Grosz of SRI International, Dr. Dara Dannenberg of Cognitive Systems, Dr. Gary Hendrix of Symantec, and Ms. Vicki Roy of NBS for their thorough review of this report and their many helpful suggestions. However, the responsibility for any remaining errors or inaccuracies must remain with the author.

It is not the intent of the National Bureau of Standards or NASA to recommend or endorse any of the systems, manufacturers, or organizations mentioned in this report, but simply to attempt to provide an overview of the NLP field. However, in a growing field such as NLP, important activities and products may not have been mentioned. Lack of such mention does not in any way imply that they are not also worthwhile. The author would appreciate having any such omissions or oversights called to his attention so that they can be considered for future reports.



TABLE OF CONTENTS

A. Introduction
B. Applications
C. Approach
   1. Type A: No World Models
      a. Key Words or Patterns
      b. Limited Logic Systems
   2. Type B: Systems That Use Explicit World Models
   3. Type C: Systems That Include Information About the Goals and Beliefs of Intelligent Entities
D. The Parsing Problem
E. Grammar
   1. Phrase Structure Grammar: Context-Free Grammar
   2. Transformational Grammar
   3. Case Grammar
   4. Semantic Grammars
   5. Other Grammars
F. Semantics and the Cantankerous Aspects of Language
   1. Multiple Word Senses
   2. Modifier Attachment
   3. Noun-Noun Modification
   4. Pronouns
   5. Ellipsis and Substitution
   6. Other Difficulties
G. Knowledge Representation
   1. Procedural Representations
   2. Declarative Representations
      a. Logic
      b. Semantic Networks
   3. Case Frames
   4. Conceptual Dependency
   5. Frame
   6. Scripts
H. Syntactic Parsing
   1. Template Matching
   2. Transition Nets
   3. Other Parsers
I. Semantics, Parsing and Understanding
J. NLP Systems
   1. Kinds
      a. Question Answering Systems
      b. Natural Language Interfaces (NLI's)
      c. Computer-Aided Instruction (CAI)
      d. Discourse
      e. Text Understanding
      f. Text Generation
      g. System Building Tools
   2. Research NLP Systems
      a. Interfaces to Computer Programs
      b. Natural Language Interfaces to Large Data Bases
      c. Text Understanding
      d. Text Generation
      e. Machine Translation
      f. Current Research NLP Systems
   3. Commercial Systems
K. State of the Art
L. Problems and Issues
   1. How People Use Language
   2. Linguistics
   3. Conversation
   4. Processor Design
   5. Data Base Interfaces
   6. Text Understanding
M. Research Required
   1. How People Use Language
   2. Linguistics
   3. Conversation
   4. Processor Design
   5. Data Base Interfaces
   6. Text Understanding
N. Principal U.S. Participants in NLP
   1. Research and Development
   2. Principal U.S. Government Agencies Funding NLP Research
   3. Commercial NLP Systems
   4. Non-U.S.
O. Forecast
P. Further Sources of Information
   1. Journals
   2. Conferences
   3. Recent Books
   4. Overviews and Surveys
References
Glossary

LIST OF TABLES

I. Some Applications of Natural Language Processing
II. Natural Language Understanding Systems
   a. SHRDLU
   b. SOPHIE
   c. TDUS
   d. LUNAR
   e. PLANES/JETS
   f. ROBOT/INTELLECT
   g. LADDER
   h. SAM
   i. PAM
III. Current Research NLP Systems
IV. Some Commercial Natural Language Systems

LIST OF FIGURES

1. A Transition Network for a Small Subset of English
2. Example Noun Phrase Decomposition
3. Parse Tree Representation of the Noun Phrase Surface Structure



NATURAL LANGUAGE PROCESSING

A. Introduction

One major goal of Artificial Intelligence (AI) research has been to develop the means to interact with machines in natural language (in contrast to a computer language). The interaction may be typed, printed or spoken. The complementary goal has been to understand how humans communicate. The scientific endeavor aimed at achieving these goals has been referred to as computational linguistics,* an effort at the intersection of AI, linguistics, philosophy and psychology.

Human communication in natural language is an activity of the whole intellect. AI researchers, in trying to formalize what is required to properly address natural language, find themselves involved in the long term endeavor of having to come to grips with this whole activity. (Formal linguists tend to restrict themselves to the structure of language.) The current AI approach is to conceptualize language as a knowledge-based system for processing communications and to create computer programs to model that process.

A communication act can serve many purposes, depending on the goals, intentions, and strategies of the communicator. One goal of a communication is to change some aspect of the recipient's mental state. Thus, communication endeavors to add or modify knowledge, change a mood, elicit a response, or establish a new goal for the recipient.

For a computer program to interpret a relatively unrestricted natural language communication, a great deal of knowledge is required. Knowledge is needed of:

- the structure of sentences
- the meaning of words
- the morphology of words
- a model of the beliefs of the sender
- the rules of conversation, and
- an extensive shared body of general information about the world.

This body of knowledge can enable a computer (like a human) to use expectation-driven processing in which knowledge about the usual properties of known objects, concepts, and what typically happens in situations, can be used to understand incomplete or ungrammatical sentences in appropriate contexts.

Thus, Barrow (1979, p. 12) observes:

    In current attempts to handle natural language, the need to use knowledge about the subject matter of the conversation, and not just grammatical niceties, is recognized; it is now believed that reliable translation is not possible without such knowledge. It is essential to find the best interpretation of what is uttered that is consistent with all sources of knowledge: lexical, grammatical, semantic (meaning), topical, and contextual.

*Or more broadly, as Cognitive Science.



Arden (1980, p. 463) adds:

    In writing a program for understanding languages, one is faced with all the problems of artificial intelligence, problems of coping with huge amounts of knowledge, of finding ways to represent and describe complex cognitive structures, as well as finding an appropriate structure in a gigantic space of possibilities. Much of the research in understanding natural languages is aimed at these problems.



As indicated earlier, natural language communication between humans is very dependent upon shared knowledge, models of the world, models of the individuals they are communicating with, and the purposes or goals of the communication. Because the listener has certain expectations based on the context and his (or her) models, it is often the case that only minimal cues are needed in the communication to activate these models and determine the meaning.

The next section, B, briefly outlines applications for natural language processing (NLP) systems. Sections C to I review the technology involved in constructing such systems, with existing NLP systems being summarized in Section J.

The state of the art, problems and issues, research requirements and the principal participants in NLP are covered in Sections K through N. Section O provides a forecast of future developments.

A glossary of terms in NLP is provided at the back of this report. Further sources of information are listed in Section P.



B. Applications 

There are many applications for computer-based natural language understanding systems. Some of these are listed in Table I.

TABLE I. Some Applications of Natural Language Processing.

Discourse
    Speech Understanding
    Story Understanding

Interaction with Intelligent Programs
    Expert Systems Interfaces
    Decision Support Systems
    Explanation Modules for Computer Actions
    Interactive Interfaces to Computer Programs

Information Access
    Information Retrieval
    Question Answering Systems
    Computer-Aided Instruction

Information Acquisition or Transformation
    Machine Translation
    Document or Text Understanding
    Automatic Paraphrasing
    Knowledge Compilation
    Knowledge Acquisition

Interacting with Machines
    Control of Complex Machines

Language Generation
    Document or Text Generation
    Speech Output
    Writing Aids: e.g., grammar checking



C. Approach 

Natural Language Processing (NLP) systems utilize both linguistic knowledge and domain knowledge [text illegible in source]. Hendrix and Sacerdoti (1981) classify systems as Types A, B or C,* with Type A being the simplest, least capable and correspondingly least costly systems.

1. Type A: No World Models

a. Key Words or Patterns

The simplest systems utilize ad hoc data structures to store facts about a limited domain. Input sentences are scanned by the programs for predeclared key words, or patterns, that indicate known objects or relationships. Using this approach, early simple template-based systems, while ignoring the complexities of language, sometimes were able to achieve impressive results. Usually heuristic empirical rules were used to guide the interpretations.
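The key-word approach can be illustrated with a minimal Python sketch. It is not taken from any system discussed in this report: the fact store `FACTS`, the pattern list `PATTERNS`, and the function `answer` are hypothetical names chosen for illustration, and the single pattern covers only one phrasing.

```python
import re

# Hypothetical ad hoc fact store for a tiny domain (illustrative only).
FACTS = {
    ("capital", "france"): "Paris",
    ("capital", "japan"): "Tokyo",
}

# Predeclared patterns: each pairs a surface pattern with a relation name.
PATTERNS = [
    (re.compile(r"\bcapital of (\w+)", re.IGNORECASE), "capital"),
]


def answer(sentence):
    """Scan the sentence for a known pattern and look up the matching fact."""
    for pattern, relation in PATTERNS:
        match = pattern.search(sentence)
        if match:
            key = (relation, match.group(1).lower())
            return FACTS.get(key, "unknown")
    return "no pattern matched"
```

Calling `answer("What is the capital of France?")` finds the pattern and returns the stored fact. The sketch ignores sentence structure entirely, which is the limitation the text attributes to Type A systems.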

b. Limited Logic Systems

In limited logic systems, information in their data base was stored in some formal notation and language mechanisms were utilized to translate the input into the internal form. The internal form chosen was such as to facilitate performing logical inferences on information in the data base.

2. Type B: Systems That Use Explicit World Models

In these systems, knowledge about the domain is explicitly encoded, usually in frame or network representations (discussed in a later section) that allow the system to understand input in terms of context and expectations. Cullingford's work (Schank and Abelson, 1977) on SAM (Script Applier Mechanism) is a good example of this approach.

3. Type C: Systems That Include Information About the Goals and Beliefs of Intelligent Entities

These advanced systems (still in the research stage) attempt to include in their knowledge base information about the beliefs and intentions of the participants in the communication. If the goal of the communication is known, it is much easier to interpret the message. Schank and Abelson's (1977) work on plans and themes reflects this approach.

D. The Parsing Problem

For more complex systems than those based on key words and pattern matching, language knowledge is required to interpret the sentences. The system usually begins by "parsing" the input (processing an input sentence to produce a more useful representation for further analysis). This representation is normally a structural description of the sentence indicating the relationships of the component parts. To address the parsing problem and to interpret the result, the computational linguistic community has studied syntax, semantics, and pragmatics. Syntax is the study of the structure of phrases and sentences. Semantics is the study of meaning. Pragmatics is the study of the use of language in context.

*Other system classifications are possible, e.g., those based on the range of syntactic coverage.

E. Grammar 

Barr and Feigenbaum (1981, p. 229) state, "A grammar of a language is a scheme for specifying the sentences allowed in the language, indicating the syntactic rules for combining words into well-formed phrases and clauses." The following grammars are some of the most important.*

1. Phrase Structure Grammar: Context-Free Grammar

Chomsky (see, for example, Winograd, 1983) had a major impact on linguistic research by devising a mathematical approach to language. Chomsky defined a series of grammars based on rules for rewriting sentences into their component parts. He designated these as 0, 1, 2, or 3, based on the restrictions associated with the rewrite rules, with 3 being the most restrictive.

Type 2, Context-Free (CF) or Phrase Structure Grammar (PSG), has been one of the most useful in natural-language processing. It has the advantage that all sentence structure derivations can be represented as a tree and practical parsing algorithms exist. Though it is a relatively natural grammar, it is unable to capture all of the sentence constructions found in most natural languages such as English. Gazdar (1981) has recently broadened the applicability of CF PSG by adding augmentations to handle situations that do not fit the basic grammar. This Generalized Phrase Structure Grammar is now being developed by Hewlett Packard (Gawron et al., 1982).

2. Transformational Grammar 

Tennant (1981, p. 89) observes that "The goal of a language analysis program is recognizing grammatical sentences and representing them in a canonical structure (the underlying structure)." A transformational grammar (Chomsky, 1957) consists of a dictionary, a phrase structure grammar and a set of transformations. In analyzing sentences, using a phrase structure grammar, first a parse tree is produced. This is called the surface structure. The transformational rules are then applied to the parse tree to transform it into a canonical form called the deep (or underlying) structure. As the same thing can be stated in several different ways, there may be many surface structures that translate into a single deep structure.

3. Case Grammar

Case Grammar is a form of Transformational Grammar in which the deep structure is based on cases: semantically relevant syntactic relationships. The central idea is that the deep structure of a simple sentence consists of a verb and one or more noun phrases associated with the verb in a particular relationship. These semantically relevant relationships are called cases. Fillmore (1971) proposed the following cases: Agent, Experiencer, Instrument, Object, Source, Goal, Location, Time and Path.



*Charniak and Wilks (1976) provide a good overview of the various approaches.



The cases for each verb form an ordered set referred to as a "case frame." A case frame for the verb "open" would be:

    (object (instrument) (agent))

which indicates that open always has an object, but the instrument or agent can be omitted as indicated by their surrounding parentheses. Thus the case frame associated with the verb provides a template which aids in understanding a sentence.
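As a rough illustration of how a case frame can act as a template, the following Python sketch encodes the frame for "open" given above and checks a set of candidate role fillers against it. The dictionary layout and the function name `fill_case_frame` are assumptions made for this sketch, not a representation used by any system described here.

```python
# Case frame for "open" as described in the text: the object is required,
# while the instrument and agent are optional.
# The dictionary layout is an assumption made for this sketch.
CASE_FRAMES = {
    "open": {"required": ["object"], "optional": ["instrument", "agent"]},
}


def fill_case_frame(verb, roles):
    """Return the filled frame for the verb.

    Raises ValueError if a required case is missing or an unknown case
    is supplied. Unfilled optional cases are recorded as None.
    """
    frame = CASE_FRAMES[verb]
    allowed = frame["required"] + frame["optional"]
    missing = [case for case in frame["required"] if case not in roles]
    if missing:
        raise ValueError(f"missing required case(s): {missing}")
    unknown = [case for case in roles if case not in allowed]
    if unknown:
        raise ValueError(f"unknown case(s): {unknown}")
    return {case: roles.get(case) for case in allowed}
```

For "The key opened the door", a caller might pass `{"object": "door", "instrument": "key"}`; the agent slot stays empty, which the frame permits.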

4. Semantic Grammars

To achieve practical systems in limited domains, it is often useful to replace conventional syntactic constituents such as noun phrases, verb phrases and prepositions with meaningful semantic components. Thus, in place of nouns when dealing with a naval data base, one might use ships, captains, ports and cargos. This approach gives direct access to the semantics of a sentence and substantially simplifies and shortens the processing. Grammars based on this approach are referred to as semantic grammars (see, e.g., Burton, 1976).
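A minimal Python sketch of the idea follows, assuming a toy naval domain. The category `SHIPS`, the rule table `RULES`, and the function `parse_query` are hypothetical and much simpler than a real semantic grammar; the point is only that the grammar's categories are domain concepts rather than parts of speech.

```python
# Hypothetical semantic category for a toy naval data base.
SHIPS = {"kennedy", "nimitz"}

# Each rule has the form QUERY -> <two fixed words> ["the"] SHIP and maps
# directly to the kind of request being made.
RULES = {
    ("where", "is"): "location",
    ("who", "commands"): "captain",
}


def parse_query(sentence):
    """Return a request dictionary if the sentence fits a rule, else None."""
    words = [w for w in sentence.lower().rstrip("?.").split() if w != "the"]
    request = RULES.get(tuple(words[:2]))
    if request and len(words) == 3 and words[2] in SHIPS:
        return {"request": request, "ship": words[2]}
    return None
```

Because the final constituent must be a SHIP rather than any noun, "Where is the table?" is rejected without any separate semantic check.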

5. Other Grammars 

A variety of other, but less prominent, grammars have been devised. Still others can be expected to be devised in the future. One example is Montague Grammar (Dowty et al., 1981) which uses a logical functional representation for the grammar and therefore is well suited for the parallel-processing logical approach now being pursued by the Japanese (see Nishida and Doshita, 1982) for their future AI work as embodied in their Fifth Generation Computer research project.

F. Semantics and the Cantankerous Aspects of Language 

Semantic processing, as it tries to interpret phrases and sentences, attaches meanings to the words. Unfortunately, English does not make this as simple as looking up the word in the dictionary, but provides many difficulties which require context and other knowledge to resolve.

1. Multiple Word Senses

Syntactic analysis can resolve whether a word is used as a noun or a verb, but further analysis is required to select the sense (meaning) of the noun or verb that is actually used. For example, "fly" used as a noun may be a winged insect, a fancy fishhook, a baseball hit high in the air, or several other interpretations as well. The appropriate sense can be determined by context (e.g., for "fly" the appropriate domain of interest could be extermination, fishing, or sports), or by matching each noun sense with the senses of other words in the sentence. This latter approach was taken by Reiger and Small (1979) using the (still embryonic) technique of "interacting word experts", and by Finin (1980) and McDonald (1982) as the basis for understanding noun compounds.
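One simple way to picture context-based sense selection is to count how many cue words for each sense appear in the sentence. The Python sketch below does this for "fly"; the cue-word sets and the function `choose_sense` are invented for illustration, and this word-overlap scoring is a stand-in, not the "interacting word experts" technique cited above.

```python
# Hypothetical cue words for three noun senses of "fly" (illustrative only).
SENSES = {
    "fly": {
        "winged insect": {"swat", "insect", "buzz", "exterminator"},
        "fancy fishhook": {"trout", "rod", "cast", "fishing"},
        "baseball hit high in the air": {"batter", "outfield", "inning", "baseball"},
    }
}


def choose_sense(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word for any sense appears in the sentence.
    """
    context = set(sentence.lower().split())
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A sentence mentioning a batter and the outfield selects the baseball sense; a sentence with no cue words leaves the ambiguity unresolved, which is where broader domain knowledge would be needed.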

2. Modifier Attachment

Where to attach a prepositional phrase to the parse tree cannot be determined by syntax alone but requires semantic knowledge. "Put the plant in the box on the table" is an example illustrating the difficulties that can be encountered with prepositional phrases.



3. Noun-Noun Modification

Choosing the appropriate relationship when one noun modifies another depends on semantics. For example, for "apple vendor", one's knowledge tends to force the interpretation "vendor of apples" rather than "an apple that is a vendor."

4. Pronouns

Pronouns allow a simplified reference to previously used (or implied) nouns, sets or events. Where feasible, pronoun antecedents are usually identified by reference to the most recent noun phrase having the same pragmatic context as the pronoun.

5. Ellipsis and Substitution 

Ellipsis is the phenomenon of not stating explicitly some words in a sentence, but leaving it to the reader or listener to fill them in. Substitution is similar: using a dummy word in place of the omitted words. Employing pragmatics, ellipses and substitutions are usually resolved by matching the incomplete statement to the structures of previous recent sentences, finding the best partial match and then filling in the rest from this matching previous structure.

6. Other Difficulties 

In addition to those just mentioned, there are other difficulties, such as anaphoric references, ambiguous noun groups, adjectivals, and incorrect language usage.

G. Knowledge Representation*

As the AI approach to natural language processing is heavily knowledge-based, it is not surprising that a variety of knowledge representation (KR) techniques have found their way into the field. Some of the more important ones are:

1. Procedural Representations: The meanings of words or sentences being expressed as computer programs that reason about their meaning.

2. Declarative Representations

a. Logic: Representation in First Order Predicate Logic, for example.
b. Semantic Networks: Representations of concepts and relationships between concepts as graph structures consisting of nodes and labeled connecting arcs.

3. Case Frames: (covered earlier)

4. Conceptual Dependency: This approach (related to case frames) is an attempt to provide a representation of all actions in terms of a small number of semantic primitives into which input sentences are mapped (see, e.g., Schank and Riesbeck, 1981). The system relies on 11 primitive physical, instrumental and mental ACT's (propel, grasp, speak, attend, PTRANS, ATRANS, etc.), plus several other categories or concept types.

5. Frame: A complex data structure for representing a whole situation, complex object or series of events. A frame has slots for objects and relations appropriate to the situation.

6. Scripts: Frame-like data structures for representing stereotyped sequences of events to aid in understanding simple stories.

*More complete presentations on KR can be found in Chapter III of Barr and Feigenbaum (1981), and in Gevarter (1983).
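To make the semantic-network item above concrete, here is a small Python sketch of nodes joined by labeled arcs. The class `SemanticNetwork` and its methods are hypothetical, and the lookup that climbs "isa" arcs (property inheritance) is an assumption added for illustration; the text itself only specifies nodes and labeled connecting arcs.

```python
from collections import defaultdict


class SemanticNetwork:
    """Concepts as nodes; relationships as labeled arcs between nodes."""

    def __init__(self):
        # arcs[source][label] = target
        self.arcs = defaultdict(dict)

    def add(self, source, label, target):
        """Add a labeled arc from source to target."""
        self.arcs[source][label] = target

    def lookup(self, node, label):
        """Find the target of a labeled arc, climbing 'isa' arcs if needed.

        The inheritance step is an assumption of this sketch.
        """
        seen = set()
        while node is not None and node not in seen:
            seen.add(node)
            if label in self.arcs[node]:
                return self.arcs[node][label]
            node = self.arcs[node].get("isa")
        return None
```

After adding the arcs (canary, isa, bird) and (bird, can, fly), a call to `lookup("canary", "can")` follows the "isa" arc and finds "fly" on the bird node.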

H. Syntactic Parsing 

Parsing assigns structures to sentences. The following types have been developed over the years for NLP (Barr and Feigenbaum, 1981).

1. Template Matching. Most of the early, and some current, NL programs perform parsing by matching their input sentences against a series of stored templates.

2. Transition Nets

Phrase structure grammars can be syntactically decomposed using a set of rewrite rules such as indicated in Figure 1. Observe that a simple sentence can be rewritten as a Noun Phrase and a Verb Phrase as indicated by:

    S → NP VP

The noun phrase can be rewritten by the rule

    NP → (DET) (ADJ*) N (PP*)

where the parentheses indicate that the item is optional, while the asterisk indicates that any number of the items may occur. The items, if they appear in the sentence, must occur in the order shown. The following example shows how a noun phrase can be analyzed:

    NP                                  DET   ADJ     N           PP
    The large satellite in the sky  →   The   large   satellite   in the sky

where PP is a prepositional phrase.

Thus, the parser examines the first word to see if it corresponds to its list of determiners (the, a, one, every, etc.). If the first word is found to be a determiner, the parser notes this and proceeds on to the next word; otherwise it checks to see if the first word is an adjective, and so forth. If a preposition is encountered in the sentence, the parser calls the prepositional phrase (PP) rule.

An NP transition network is shown as the second diagram in Figure 1, where it starts in the initial state (4) and moves to state (5) if it finds a determiner or an adjective, or on to state (6) when a noun is found. The loops for ADJ and PP indicate that more than one adjective or prepositional phrase can occur. Note that the PP rule can in turn call an NP rule, resulting in a nested structure. An example of an analyzed noun phrase is shown in Figures 2 and 3.
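The NP and PP rules can be sketched in Python as two mutually recursive functions, one per rule, in the spirit of the procedure described above. The lexicon and the function names `parse_np` and `parse_pp` are assumptions for this sketch, and each prepositional phrase is attached greedily to the nearest noun phrase, which sidesteps rather than solves the modifier attachment problem.

```python
# Hypothetical lexicon mapping words to syntactic categories.
LEXICON = {
    "the": "DET", "a": "DET",
    "large": "ADJ",
    "satellite": "N", "sky": "N", "payload": "N", "tether": "N", "shuttle": "N",
    "in": "PREP", "on": "PREP", "under": "PREP",
}


def category(words, i):
    """Category of the word at position i, or None past the end."""
    return LEXICON.get(words[i]) if i < len(words) else None


def parse_np(words, i=0):
    """NP -> (DET) (ADJ*) N (PP*). Returns (tree, next_index) or None."""
    children = []
    if category(words, i) == "DET":
        children.append(("DET", words[i]))
        i += 1
    while category(words, i) == "ADJ":
        children.append(("ADJ", words[i]))
        i += 1
    if category(words, i) != "N":
        return None
    children.append(("N", words[i]))
    i += 1
    while category(words, i) == "PREP":
        result = parse_pp(words, i)
        if result is None:
            break
        pp, i = result
        children.append(pp)
    return ("NP", children), i


def parse_pp(words, i):
    """PP -> PREP NP. The NP rule is called recursively, giving nesting."""
    if category(words, i) != "PREP":
        return None
    result = parse_np(words, i + 1)
    if result is None:
        return None
    np, j = result
    return ("PP", [("PREP", words[i]), np]), j
```

Running `parse_np("the large satellite in the sky".split())` yields a nested tuple in which the PP node contains its own NP, mirroring the way the PP rule calls the NP rule.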



[Figure 1: transition network diagrams not reproduced. The accompanying grammar is given below.]

    GRAMMAR

    S  → NP VP
    NP → (DET) (ADJ*) N (PP*)
    PP → PREP NP
    VP → VTRAN NP

Figure 1. A Transition Network for a Small Subset of English. Each diagram represents a rule for finding the corresponding word pattern. Each rule can call on other rules to find needed patterns. After Graham (1979, p. 214).



    NP:              The payload on a tether under the shuttle
    NP → DET N PP:   The | payload | on a tether under the shuttle
    PP → PREP NP:    on | a tether under the shuttle
    NP → DET N PP:   a | tether | under the shuttle
    PP → PREP NP:    under | the shuttle
    NP → DET N:      the | shuttle

Figure 2. Example Noun Phrase Decomposition.

[Figure 3: parse tree diagram for "The payload on a tether under the shuttle" not reproduced.]

Figure 3. Parse Tree Representation of the Noun Phrase Surface Structure.



As the transition networks analyze a sentence, they can collect information about the word patterns they recognize and fill slots in a frame associated with each pattern. Thus, they can identify noun phrases as singular or plural, whether the nouns refer to persons and if so their gender, etc., needed to produce a deep structure. A simple approach to collecting this information is to attach subroutines to be called for each transition. A transition network with such subroutines attached is called an "augmented transition network," or ATN. With ATN's, word patterns can be recognized. For each word pattern, we can fill slots in a frame. The resulting filled frames provide a basis for further processing.
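The idea of attaching subroutines to transitions can be sketched in Python as follows. The state numbers 4, 5 and 6 follow the NP network described earlier; the lexicon, the action functions, and `run_atn` are hypothetical, and the sketch is deterministic (no backtracking) and omits the PP loop for brevity.

```python
# Hypothetical lexicon: word -> (category, features).
LEXICON = {
    "the": ("DET", {}),
    "a": ("DET", {}),
    "large": ("ADJ", {}),
    "satellite": ("N", {"number": "singular"}),
    "satellites": ("N", {"number": "plural"}),
}


# Subroutines attached to transitions; each one fills slots in the frame.
def note_determiner(frame, word, features):
    frame["determiner"] = word


def note_adjective(frame, word, features):
    frame.setdefault("modifiers", []).append(word)


def note_noun(frame, word, features):
    frame["head"] = word
    frame["number"] = features["number"]


# state -> list of (category, next_state, attached subroutine)
NP_NETWORK = {
    4: [("DET", 5, note_determiner), ("ADJ", 5, note_adjective), ("N", 6, note_noun)],
    5: [("ADJ", 5, note_adjective), ("N", 6, note_noun)],
    6: [],
}
FINAL_STATES = {6}


def run_atn(words, start=4):
    """Walk the network over the words, filling a frame along the way.

    Returns the filled frame, or None if the words are not accepted.
    """
    state, frame = start, {}
    for word in words:
        cat, features = LEXICON.get(word, (None, {}))
        for arc_cat, next_state, action in NP_NETWORK[state]:
            if arc_cat == cat:
                action(frame, word, features)
                state = next_state
                break
        else:
            return None  # no arc leaves this state for the current word
    return frame if state in FINAL_STATES else None
```

For "the large satellite" the returned frame records the determiner, the modifier list, the head noun and its number, which is the kind of filled frame the text describes as a basis for further processing.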

5. Other Parsers

Other parsing approaches have been devised, but ATN's remain the most popular syntactic
parsers. ATN's are top-down parsers in that the parsing is directed by an anticipated sentence
structure. An alternative approach is bottom-up parsing, which examines the input words along
the string from left to right, building up all possible structures to the left of the current word as
the parser advances. A bottom-up parser could thus build many partial sentence structures that
are never used, but the diversity could be an advantage in trying to interpret input word strings
that are not clearly delineated sentences or contain ungrammatical constructions or unknown
words. There have been recent attempts to combine the top-down with the bottom-up approach
for NLP in a similar manner as has been done for Computer Vision (see, e.g., Gevarter, 1982).

For a recent overview of parsing approaches see Slocum (1981).
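
The bottom-up strategy described above can be illustrated with a small chart builder. This is a sketch under stated assumptions, not a parser from the literature cited here: the lexicon, the binary rule table (a simplification of the rules in Figure 1), and the function name `bottom_up_chart` are hypothetical. As each word is read, every structure that ends at that word is built from the pieces to its left.

```python
from collections import defaultdict

# Hypothetical lexicon and binary rules, loosely after Figure 1.
LEXICON = {
    "the": "DET", "a": "DET",
    "payload": "N", "tether": "N", "shuttle": "N",
    "on": "PREP", "under": "PREP",
    "carries": "VTRAN",
}
RULES = {
    ("DET", "N"): "NP",
    ("NP", "PP"): "NP",
    ("PREP", "NP"): "PP",
    ("VTRAN", "NP"): "VP",
    ("NP", "VP"): "S",
}


def bottom_up_chart(words):
    """Return chart[(i, j)] = set of categories spanning words[i:j]."""
    chart = defaultdict(set)
    for j in range(1, len(words) + 1):       # advance through the string left to right
        cat = LEXICON.get(words[j - 1])
        if cat:                              # an unknown word simply leaves a gap
            chart[(j - 1, j)].add(cat)
        for i in range(j - 2, -1, -1):       # build every structure ending at word j
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        parent = RULES.get((left, right))
                        if parent:
                            chart[(i, j)].add(parent)
    return chart


chart = bottom_up_chart("the shuttle carries the payload on a tether".split())
fragments = bottom_up_chart("on a tether zzz the shuttle".split())
```

For the first input the chart holds a complete sentence over the whole string, and also a sentence over the first five words that the final analysis never uses. For the second input, which contains an unknown word, no sentence is found, but the prepositional phrase and the noun phrase on either side of the unknown word are still available for later interpretation.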

I. Semantics, Parsing and Understanding

The role of syntactic parsing is to construct a parse tree or similar structure of the sentence to 
indicate the grammatical use of the words and how they are related to each other. The role of 
semantic processing is to establish the meaning of the sentence. This requires facing up to all the
cantankerous ambiguities discussed earlier. 

In natural languages (unlike restricted languages, e.g., semantic grammars) it is often difficult
to parse the sentences and hook phrases into the proper portion of the parse tree, without some
knowledge of the meaning of the sentence. This is especially true when the discourse is
ungrammatical. Therefore, it has been suggested that semantics be used to help guide the path of the
syntactic parser (see, for example, Charniak, 1981). For that case, syntax presses ahead as far as it
can and then hands off its results to the semantic portion to disambiguate the possibilities. Woods
(1980) has extended ATN grammars for this purpose. Barr and Feigenbaum (1981, p. 257)
indicate that present language understanding systems are indeed tending toward the use of multiple
sources of knowledge and are intermixing syntactics and semantics.

Charniak (1981) indicates that there have been two main lines of attack on word sense
ambiguity. One is the use of discrimination nets (Rieger and Small, 1979) that utilize the syntactic parse
tree (by observing the grammatical role that the word plays, such as taking a direct object, etc.) in
helping to decide the word sense. The other approach is based on the frame/script idea (used,
e.g., for story comprehension) that provides a context and the expected sense of the word (see,
e.g., Schank and Abelson, 1977).
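
The second line of attack can be illustrated with a toy sense selector. This is a minimal sketch under assumed data: the sense inventory, the script names, and the function `pick_sense` are hypothetical, and the rule of falling back to a "general" sense is an assumption added for the example. The active script supplies the context, and the first listed sense that fits that context is chosen.

```python
# Hypothetical sense inventory: word -> list of (sense, scripts it fits).
SENSES = {
    "check": [("bill", {"restaurant"}),
              ("verify", {"repair", "general"}),
              ("bank_draft", {"banking"})],
    "order": [("request_food", {"restaurant"}),
              ("sequence", {"general"}),
              ("command", {"military", "general"})],
}


def pick_sense(word, active_script):
    """Return the first listed sense compatible with the active script,
    falling back to a 'general' sense, else None."""
    candidates = SENSES.get(word, [])
    for sense, scripts in candidates:
        if active_script in scripts:
            return sense
    for sense, scripts in candidates:
        if "general" in scripts:
            return sense
    return None


restaurant_sense = pick_sense("check", "restaurant")
repair_sense = pick_sense("check", "repair")
```

With the restaurant script active, "check" is read as the bill; with a repair script active, the same word is read as a request to verify something.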






Another approach is "preference semantics" (Wilks, 1975) which is a system of semantic
primitives through which the best sense in context is determined. This system [...]
(entities, actions, cases, qualifiers, and type indicators). Representation of a sentence is in terms of
these primitives which are arranged to relate agents, actions and objects. These have preferential
relations to each other. Wilks' approach finds the match that best satisfies these preferences.

Charniak indicates that the semantics at the level of the word sense is not the end of the parsing
process, but what is desired is understanding or comprehension (associated [...]
structure (see, e.g., Schank and Riesbeck, 1981) plays an important role.
J. NLP Systems

As indicated below, various NLP systems have been developed for a variety of functions.

a. Question Answering Systems

[...] most popular of the NLP [...] a data base for a limited
domain and that most of the user discourse is limited to questions.

b. Natural Language Interfaces (NLI's)

[...] means of communicating questions or instructions
to a complex computer program.

c. Computer-Aided Instruction

Arden (1980, p. 465) states:

[...] natural languages is the interaction needed for effective [...]

d. Discourse

Systems that are designed to understand discourse (extended dialogue) usually employ [...]

e. Text Understanding

[...] and Riesbeck, 1981) and others have addressed themselves to this
problem, much more remains to be done. Techniques for understanding printed text include
scripts and causative approaches.



Arden (1980, pp. 465-466) states:

To understand a text, a system needs not only a knowledge of the structure of the language but a body
of "world knowledge" about the domain discussed in the text. Thus a comprehensive text-understanding
system presupposes an extensive reasoning system, one with a base of common-sense and domain-specific
knowledge.

The problem of "understanding" a piece of text does, however, serve as a basic framework for current
research in natural languages. Programs are written which accept text input and illustrate their understanding
of it by answering questions, giving paraphrases, or simply providing a blow-by-blow account of the
reasoning that goes on during the analysis. Generally, the programs operate only on a small preselected set of texts
created or chosen by the author for exploring a small set of theoretical problems.

f. Text Generation

There are two major aspects of text generation: one is the determination of the content and
textual shape of the message; the second is transforming it into natural language. There are two
approaches for accomplishing this. The first is indexing into canned text and combining it as
appropriate. The second is generating the text from basic considerations. One need for text
generation results from the situation in which information sources need to be combined to form a
new message. Unfortunately, simply adjoining sentences from different contexts usually
produces confusing or misleading text. Another need for text generation is for explanations of Expert
System actions. Text generation will become particularly important as data bases gradually shift
to true knowledge bases where complex output has to be presented linguistically. McDonald's
thesis (1980) provides one of the most sophisticated approaches to text generation.
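
The first approach, indexing into canned text and combining it, can be sketched as follows. The message types, the wording of the canned text, and the functions `generate` and `respond` are assumptions invented for this illustration; they do not come from any system described in this report.

```python
# Hypothetical canned-text table, indexed by message type.
CANNED = {
    "not_found": "I could not find any {item} matching your request.",
    "count": "There are {n} {item} in the data base.",
    "confirm": "OK.",
}


def generate(message_type, **slots):
    """Index into the canned text and fill its blanks from the slots."""
    return CANNED[message_type].format(**slots)


def respond(messages):
    """Combine several canned pieces into one reply."""
    return " ".join(generate(kind, **slots) for kind, slots in messages)


reply = respond([("confirm", {}), ("count", {"n": 3, "item": "flight records"})])
```

The sketch also shows the weakness noted above: the pieces are simply adjoined, so nothing guarantees that text assembled from different contexts reads coherently.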

g. System Building Tools

Recently, computer languages and programs especially designed to aid in building NLP systems
have begun to appear. An example is OWL developed at MIT as a semantic network knowledge
representation language for use in constructing natural language question answering systems.

2. Research NLP Systems

Until recently, virtually all of the NLP systems generated were of a research nature. These NLP
systems basically were aimed at serving five functions:

a. Interfaces to Computer Programs
b. Data Base Retrieval
c. Text Understanding
d. Text Generation
e. Machine Translation

A few of the more prominent systems are briefly reviewed in this section.
a. Interfaces to Computer Programs 

One of the most important early NLP systems, SHRDLU, was a complete system combining
syntactic and semantic processing. This system, an interface to a research Blocks
World simulation, is described in Table IIa.

SOPHIE (Table IIb), a Computer-Aided Instruction (CAI) system, made use of a semantic
grammar to parse the input and to provide instruction based on a simulation of a power supply
circuit.

TDUS (Table IIc) uses a procedural network (which encodes basic repair operations) to
interpret a dialog with an apprentice engaged in repair of an electro-mechanical pump.
b. Natural Language Interfaces to Large Data Bases

One of the important and prominent research areas for NLP is intelligent front ends to data
base retrieval systems. LUNAR (Table IId) is one of the most often cited early systems. It utilized
a powerful ATN syntactic parser which passed on its results to a semantic analyzer.

PLANES (Table IIe) was a system designed as a front end to the Navy's database of
maintenance and flight records for all naval aircraft. This semantic-grammar-based system ignores the
sentence's syntax, searching instead for meaningful semantic constituents by using ATN subnets.
These subnets include PLANETYPE, TIME PERIOD, ACTION, etc.

ROBOT (Table IIf) uses an ATN syntactic parser followed by a semantic analyzer to produce a
formal query language representation of the input sentence. ROBOT has proved to be very
versatile.

LIFER/LADDER (Table IIg) uses patterns or templates to interpret sentences. It employs a
semantic (pragmatic) grammar, which greatly simplifies the interpretation. It can handle ellipses
and pronouns.
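
The template idea can be illustrated with a toy interpreter. This is a sketch in the spirit of the description above, not the LIFER implementation: the ship names, the categories, the single template, and the substitution rule used for elliptical follow-up questions are all assumptions made for the example.

```python
# Hypothetical semantic categories for a ship data base.
CATEGORIES = {
    "ATTRIBUTE": {"length", "speed", "home port"},
    "SHIP": {"kennedy", "nimitz"},
}


def classify(phrase):
    """Return the semantic category a phrase belongs to, or None."""
    for name, members in CATEGORIES.items():
        if phrase in members:
            return name
    return None


def match_template(words):
    """Template: 'what is the <ATTRIBUTE> of the <SHIP>'.
    The function associated with the template builds a query dictionary."""
    if words[:3] != ["what", "is", "the"] or "of" not in words:
        return None
    split = words.index("of")
    attribute = " ".join(words[3:split])
    ship = " ".join(w for w in words[split + 1:] if w != "the")
    if classify(attribute) == "ATTRIBUTE" and classify(ship) == "SHIP":
        return {"select": attribute, "ship": ship}
    return None


def interpret(sentence, previous=None):
    """Try the template; failing that, treat the input as an elliptical
    fragment that replaces the same-category slot of the previous query."""
    words = sentence.lower().rstrip("?").split()
    query = match_template(words)
    if query is not None or previous is None:
        return query
    fragment = " ".join(w for w in words if w not in ("of", "the"))
    kind = classify(fragment)
    if kind == "ATTRIBUTE":
        return {**previous, "select": fragment}
    if kind == "SHIP":
        return {**previous, "ship": fragment}
    return None


q1 = interpret("What is the length of the Kennedy?")
q2 = interpret("of the Nimitz?", previous=q1)
q3 = interpret("home port?", previous=q2)
```

Because the template is written in terms of data base categories rather than general syntax, interpretation is simple, but the grammar must be rewritten for each new data base.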

c. Text Understanding 

SAM (Table IIh) is a research system that attempts to understand text about everyday events.
Knowledge is encoded in frames called scripts. SAM uses an English to Conceptual Dependency
parser to produce an internal representation of the story.
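
A script can be pictured as an ordered list of expected events against which a story is matched. The sketch below is an assumption-laden illustration rather than SAM's actual mechanism: the restaurant script, the event names, and the function `apply_script` are hypothetical, and events are given directly as labels instead of Conceptual Dependency structures.

```python
# Hypothetical restaurant script: an ordered list of expected events.
RESTAURANT_SCRIPT = ["enter", "be_seated", "order", "eat", "pay", "leave"]


def apply_script(script, story_events):
    """Align story events with the script in order.
    Returns the event chain with each step marked 'stated' or 'inferred',
    or None if the story does not fit the script."""
    chain = []
    position = 0
    for event in story_events:
        if event not in script[position:]:
            return None                      # out of order, or not in the script
        found = script.index(event, position)
        # Steps the story skipped over are filled in from the script.
        chain.extend((step, "inferred") for step in script[position:found])
        chain.append((event, "stated"))
        position = found + 1
    return chain


chain = apply_script(RESTAURANT_SCRIPT, ["enter", "order", "leave"])
```

A story that mentions only entering, ordering, and leaving is expanded with the unstated steps of being seated, eating, and paying. The strictly linear matching also makes visible why a story that departs from the expected sequence cannot be handled by this scheme.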

PAM (Table IIi) is one offspring of SAM. PAM understands stories by determining the goals
that are to be achieved in the story. It then attempts to match actions of the story with methods 
that it knows will achieve the goals. 

d. Text Generation 

Winograd (1983) indicates that the difficult problems in generation are those concerned with
meaning and context rather than syntax. Thus, until recently, text generation has been mostly an
outgrowth of portions of other NLP systems.

e. Machine Translation

Though machine translation was the first attempt at NLP, early failures resulted in little
further work being done in this area until recently.

f. Current Research NLP Systems

Table III lists NLP systems currently being researched.

3. Commercial Systems 

The commercial systems available today together with their approximate prices are listed in 
Table IV. Several of these systems are derivatives of the research NLP systems previously dis- 
cussed. 






TABLE IIa. Natural Language Understanding Systems.

System/Use
SHRDLU, M.I.T. (Winograd, 1972)
Natural language interface to manipulate Blocks World.

Approach
• Combines syntactic and semantic analysis with a body of world knowledge about a
  limited domain to provide an NLI to deal with manipulating blocks in a simulation of an
  artificial "Blocks World."
• Starts the analysis of a user's sentence by syntactically parsing a meaningful portion of the
  sentence. Then semantic routines are called to analyze the unit. The definitions of words in
  the dictionary are in the form of procedures (procedural semantics) to analyze the unit.
  These procedures set semantic markers of possible relations to other words. If there are no
  semantic objections, the syntactic parser continues; otherwise, it will try another parse.
• Facts are expressed in First Order Predicate Logic. Verifies hypotheses by theorem-proving.
• Generates text by "fill in the blank" and stored response patterns.
• Heuristically uses pronouns for noun phrases to reduce the stilted nature of the text response.
• Type B System.

Capabilities
• One of the first systems to deal simultaneously with many sophisticated issues of NLP:
  - parsing
  - semantics
  - references to previous discourse
  - knowledge representation
  - problem solving

Limitations
• Assumes it knows everything about the world.
• Assumes world is logical, simple, small and closed.
• Required familiarization by user to use it successfully.
• Was a prototype that proved to be non-portable and non-extensible and is no longer in use.



TABLE IIb. Natural Language Understanding Systems.

System/Use
SOPHIE (Sophisticated Instructional Environment), BBN (Brown and Burton, 1975)
C.A.I. in Electronic Trouble Shooting.

Approach
• Incorporated a simulation of a power supply circuit to [...] student suggestions.
• Employs a semantic grammar using constituents like: Request, Fault, Instrument,
  Node/Name, and Junction/Type.
• The semantic grammar worked much like a syntactic parser, but nodes in the resulting
  parse tree were meaningful semantic units.
• [...] top-down in a recursive fashion.
• Each grammar rule was a LISP function that generated a semantic representation of a
  subtree in the parse.
• Type A+ System.

Capabilities
• Could run simulations, abstract them and use the results.
• Responded in a few seconds.
• Could skip words that did not match the grammar rule.
• Very successful and robust.

Limitations
• Skipping words might change meaning of sentence significantly.
• The system organization restricts the system to only this limited domain.



TABLE IIc. Natural Language Understanding Systems.

System/Use
TDUS (Task Oriented Dialogue System), SRI (Robinson, 1980)
Interactive dialog in context. Guide repair operation on electromechanical equipment.

Approach
• Goal was to follow the context as an apprentice moved from task to task and respond
  successfully to his remarks and requests for guidance.
• Various tasks to be performed were encoded in procedural networks, an extension of
  standard network formalisms to allow encoding of quantified information and
  information about processes.
• Uses procedural network to interpret dialog.
• Assumes that referential statements refer to objects salient in the current subtask or higher
  in the task hierarchy. Uses context and discourse to identify objects referred to by definite
  noun phrases.
• Type B+ System.

Capabilities
• Understands contexts, so it can interpret remarks such as "should," "done it," etc.
• Can follow particular instantiations of actions.
• Realizes the program does not know all things. (Does not operate on "closed world"
  assumption).
• Uses procedural network system to infer unstated intermediate steps.

Limitations
• Little understanding of the goals and motivations of the apprentice.



TABLE IId. Natural Language Understanding Systems.

System/Use
LUNAR (Woods, 1973)
Natural Language Interface to Moon Rocks Data Base.

Approach
• Simplified Data Base:
  - Only a small vocabulary (3500 words) required [...] moon rock data base.
  - LUNAR data base encoded in the data base query language.
  - Seven data domains: sets [...] that could be members of each domain were mutually
    exclusive.
• [...] a powerful ATN syntactic parser.
• Parsed sentence sent on to the semantic program for translation into a query. The resulting
  query was then executed.
• [...] analyzer gathers information from verbs and [...] cases, nouns, noun modifiers and
  determiners to build the data base query. The query is built in terms of the conceptual
  primitives of the data base. Uses rules to compare the syntactic structure of the question
  with a [...]. If they match, the semantic part of the rule is added to the developing query.
• Type B- System.

Capabilities
• Can handle anaphoric references (pronoun references to previous phrases).
• Could handle 90% of the [...] posed to LUNAR by geologists.
• Overall formulation so [...] that it has since been used for [...] parsing and language
  understanding systems (Waltz, 198[?], p. 10).

Limitations
• As ATN and semantic analyzer are separate, the semantic analyzer must grope through
  parsed errors such as prepositional phrases being attached at the wrong point in the
  parse tree.
• Utterances were limited to strict data base inquiries.
• Based on a "closed world" viewpoint.
• Proved to be non-portable and non-extensible. No longer in use.



TABLE IIe. Natural Language Understanding Systems.

System/Use
PLANES/JETS (Programmed Language-based Enquiry Sys.), M.I.T. (Waltz, D.L., 1975)
Natural Language Interface to a Large Data Base.

Approach
• Data base is the Navy's 3-M relational data base which holds the maintenance and flight
  records for all naval aircraft.
• Ignores syntax. Assumes that all inputs are in the form of requests that it turns into formal
  language query expressions.
• Uses a semantic grammar. It looks for semantic constituents by doing a left to right scan of
  the user's sentence. Semantic constituents include items which belong to PLANETYPE,
  TIMEPERIOD, MALFUNCTION CODE, HOW MANY, ACTION, etc.
• Uses an ATN parser. The top level calls various subnets to analyze the input for semantic
  constituents.
• Utilizes concept case frames which are strings of constituents of reasonable queries.
• After application of the concept case frames, the resulting semantic constituents are passed
  along to the query generator.
• Type A System.

Capabilities
• Can handle ellipses and pronouns.
• Can deal with some nongrammatical sentences.
• Asks for a rephrase if it doesn't understand.

Limitations
• Relatively inefficient, could benefit from a look ahead. A look ahead could result in an
  order of magnitude reduction in number of arcs tested in the parse of a sentence.
• Problems with word sense selection and modifier attachment. PLANES relies too heavily
  on its particular world of discourse for eliminating problems of word sense selection.
• In a 1980 test, PLANES understood about 2/3 of queries correctly. Could be made into a
  useful practical program with further work.



TABLE IIf. Natural Language Understanding Systems.

System/Use
ROBOT/INTELLECT, Dartmouth (Harris, 1977)
Data Base Question Answering System.

Approach
• Uses an ATN syntactic parser ([...] backtracking) followed by semantic analysis to
  produce a formal query language representation of the input sentence.
• Handles a large vocabulary by building an inverted file of data element names indicating
  [...] in which each name occurs. In addition, the inverted file contains words and phrases
  that are [...] as data element names.
• A dictionary of common English words is also included.
• If two meanings of the inquiry appear likely, and [...] that one is interpreted to be the
  appropriate one.
• Type A System.

Capabilities
• INTELLECT is one of the [...] N.L. Data Base Query systems to be available commercially.
• Can handle idioms via special mechanisms.
• [...] adapt INTELLECT to a new data base in approximately one week.
• Can handle some pronouns and ellipses.

Limitations
• Does not consider context [...] to disambiguate pronouns and ellipses.



TABLE IIg. Natural Language Understanding Systems.

System/Use
LADDER (Language Access to Distributed Data with Error Recovery), SRI (Hendrix et al., 1978)
Natural Language Data Base Query.

Approach
• Application of LIFER parser.
• Uses patterns or templates to interpret sentences. Associates a function with each pattern.
• Uses a semantic (pragmatic) grammar and associated functions to implicitly encode
  knowledge about language and the world. The grammar contains much information about
  the particular data base being queried.
• Type A System.

Capabilities
• Can correct spelling.
• Can handle ellipsis.
• Can interpret pronouns.
• Can deal with large and complex data bases, e.g., in Naval Ship DB has dealt with:
  - 100 fields in 14 files
  - records of 40,000 ships.
• Can answer certain questions based upon its own N.L. processing system.
• Can be taught synonyms.
• Can be taught new syntactic constructions.
• Can accept a defined input sentence as equivalent to a whole set of questions.

Limitations
• Conversation is limited strictly to questions about a small domain.
• Can't deal with logically complex notions:
  - disjunction
  - quantification
  - implication
  - causality
  - possibility
• Closed-world viewpoint. Acts as if it was dealing with a world
  - containing a fixed number of objects and relationships between them
  - with objects and relationships being immutable.



TABLE IIh. Natural Language Understanding Systems.

System/Use
SAM (Script Applier Mechanism), Yale (Schank et al., 1975)
Understands events using prototype descriptions of them.

Approach
• [...] prototypical events is encoded in frames called scripts.
• Utilizes a domain dictionary. The first word sense that satisfies the local context (as
  provided by the script) is selected. (Thus scripts are [...] for interpreting words with
  multiple senses).
• Understands stories by fitting them to a script in a three part process:
  1. Parser generates a conceptual dependency (CD) representation for each sentence.
  2. A script applier (APPLY) gives it a set of verb-senses to use once a script is identified.
     [...] see if the CD sentence representation matches the current script or any other
     script in the data base. If [...] matching is successful, APPLY makes a set of predictions
     about likely inputs to follow. Any steps in the current script that were left out in the
     story are filled in.
  3. A memory module takes resultant references to people, places, things, etc., and fills in
     information about them.
• Type B System.

Capabilities
• Can produce a summary of the story (in several different languages) or answer questions
  about it.
• [...] paraphrases of the story [...] make intelligent inferences from it.
• Can infer missing information by using the script.

Limitations
• Knowledge is primarily about everyday [...] rather than about natural language.
• Only a single object can serve the role of a player or a prop.
• Scripts follow a linear sequence; can't deal with alternative possibilities.
• Difficult to determine which scripts are appropriate for a given story.



TABLE IIi. Natural Language Understanding Systems.

System/Use
PAM, Yale (Wilensky, 1978)
Story Understanding.

Approach
• Understands stories by determining the goals that are to be achieved in the story. PAM
  then attempts to match actions of the story with methods that it knows will achieve goals.
• Has a knowledge base of plans and themes. A plan is a set of actions and subgoals for
  accomplishing the main goal. Themes are basic situations encountered in life, such as "love."
• Program starts by converting written text into CD representation (as in SAM).
• Goals of an actor are determined in the following ways:
  - noting them explicitly in story
  - using plans, establishing them as subgoals to a known goal
  - inferring them from a theme noted in the story.
• Type B-C System.

Capabilities
• Can summarize a story.
• Can answer questions about goals and actions of the characters.
• Can extend SAM to stereotyped situations.

Limitations
• A great deal of inference can be required by PAM to establish the goals and subgoals of the
  story from the input text.
• Much must be known about the nature of the story to be sure that the needed stored plans
  and themes are available.



TABLE III. Current Research NLP Systems.

System: EUFID (End User Friendly Interface to Data)
Purpose: NLI to DBMS
Developer: System Development Corp., Santa Monica, California
Comments:
• Application independent.
• Uses an Intermediate Language as output of the NL analysis system. Then translates from
  this to the target DBMS query language.

System: ASK (A Simple Knowledgeable System)
Purpose: For users creating own data base
Developer: California Institute of Technology, Pasadena, California
Comments:
• Uses a limited dialect of English.
• [...] a Semantic Net with nodes limited to Classes, Objects, and Relations, and the
  appropriate corresponding arcs.

System: NLP [...]
Purpose: NLI to a DB
Developer: Bell Labs, Murray Hill, New Jersey
Comments:
• Consists of two parts, a Natural Language Processor (NLP) and a Data Base Application
  Program (DBAP).
• The NLP is general purpose language [...] a formal representation of the input. The [...]
  a query in an augmented relational algebra from the output of the NLP.
• [...] portable and said to be very robust.



TABLE III. Current Research NLP Systems. (continued)

System: IR-NLI (Internal Representation - NLI)
Purpose: NLI for an on-line information retrieval system.
Developer: U. of Udine, Udine, Italy
Comments:
• Utilizes a base of expert knowledge, which concerns the evaluation of the user's requests,
  the management of the research interview, the selection of search strategy and the
  scheduling of the lower level modules: UNDERSTANDING and DIALOGUE,
  REASONING and FORMALIZER.
• The UNDERSTANDING and DIALOGUE Module translates the user's requests into a
  basic formal internal representation.

System: TEAM (Transportable English Access Data Manager)
Purpose: Transportable NLI
Developer: SRI Inter., Menlo Park, California
Comments:
• Has three major components:
  - An acquisition component
  - The DIALOGIC Language System
  - Data-Access Component.
• Utilizes the acquisition component to obtain (via an interactive dialogue with the DB
  management personnel) the information required to adapt the system to a particular DB.
• Translates English query into a DB query in two steps:
  - The DIALOGIC system constructs a logical representation of the query.
  - The data-access component translates the logic form into a formal DB query.



TABLE III. Current Research NLP Systems. (continued)

System: NOMAD
Purpose: Text Understanding
Developer: AI Project, U. of California, Irvine, California
Comments:
• Uses internal syntactic and semantic expectations to understand unedited naval
  ship-to-shore messages.
• [...] data base of domain specific knowledge.
• Outputs a corrected well-formed English translation of the message.
• Utilizes knowledge of syntax, semantics, and pragmatics at all stages of the understanding
  process to cope with [...].

System: ([...] of Descriptive Texts)
Purpose: Text Understanding
Developer: U. of Strathclyde, Glasgow, Scotland
Comments:
• Instantiates domain dependent hierarchical frame-like structures [...] identifying key
  words and using a domain dictionary.

System: [...]
Purpose: Machine Translation
Developer: U. of Manchester, England
Comments:
• [...] source text and translates it into an intermediate (Interlingua) [...] synthesizes target
  language text from this.
• Allows only a controlled vocabulary and a restricted syntax, with the aim of
  microprocessor-based MT.

System: (English-Japanese MT)
Purpose: Machine Translation
Developer: Kyoto U., Japan
Comments:
• Uses Montague Grammar to generate an intermediate representation of meaningful
  semantic relations in a functional logical form. Converts [...] conceptual phrase structure
  form associated with Japanese.



TABLE III. Current Research NLP Systems. (continued)

System: LRC MT
Purpose: Machine Translation
Developer: U. of Texas for Siemens, Munich, W. Germany
Comments:
• Employs a phrase-structure (PS) grammar augmented by lexical controls.
• Utilizes over [...] PS rules describing the source language (German) and nearly 10,000
  lexical entries in each of two languages (German and the target language, English).
• Uses an all-paths, bottom-up parser.
• Uses special procedures to cope with ungrammatical input.

System: (Not Named NLP System)
Purpose: NLI to an inferencing KB
Developer: Hewlett Packard, Palo Alto, California
Comments:
• System's main components are:
  - A Generalized Phrase Structure Grammar
  - A top-down parser
  - A logic transducer that outputs a first-order logical representation.
  - A "disambiguator" that uses sortal information to convert logical expressions into the
    query language for HIRE (a relational data base).

System: KLAUS (Knowledge-Learning and -Using System)
Purpose: Computer acquisition of a model of a domain of interest by being instructed in English.
Developer: SRI International, Menlo Park, California
Comments:
• Uses SRI's DIALOGIC NLP System to translate English sentences into logical
  representations of their literal meaning in the context of the utterance.
• KLAUS is a DARPA-sponsored long-term research project to develop techniques for
  facilitating the acquisition of knowledge by computer.



TABLE III. Current Research NLP Systems. (continued)

System: TEXT
Purpose: Text Generation
Developer: U. of Pennsylvania, Phila., Pennsylvania
Comments:
• Schemas, which encode aspects of discourse structure, are used to guide the discourse
  process.
• [...] mechanism monitors the use of the schemas, providing constraints on what can be said
  at any point.
• On the basis of the input question, semantic processes produce a relevant knowledge pool.
  A partially ordered set of rhetorical techniques are [...] for the pool. A message is
  generated by matching [...] the associated rhetorical techniques.

System: [...]
Purpose: Text Understanding and Text Generation
Developer: IBM C.S. Dept., Yorktown Hts., New York
Comments:
• Utilizes an augmented phrase structure grammar.
• The core grammar consists at present of a set of 300 syntax rules.
• Ambiguity is resolved by using a metric that ranks alternative parses.
• A "fitted-parse" technique is used to produce reasonable approximate parses to
  ungrammatical inputs.
• Uses an on-line dictionary with about [...] entries.



TABLE III. Current Research NLP Systems. (concluded)

System: TOPIC
Purpose: Automatic Text Condensation
Developer: U. of Constance, Infor. Sci. Dept., West Germany
Comments:
• Uses frame-oriented knowledge representation models.
• [...] "word experts" approach to aid in textual parsing.

System: KAMP
Purpose: NL Generation
Developer: SRI International, Menlo Park, California
Comments:
• Plans NL utterances, starting with a high-level description of the speaker's goals.
• The heuristic plan generation process is by a NOAH-like hierarchical planner, and verified
  by a first order logic theorem prover.
• [...] the different subgoals to be achieved and linguistic rules about English to produce
  utterances that satisfy multiple goals.



TABLE IV. Some Commercial Natural Language Systems.

System: INTELLECT ([...] of ROBOT) (also distributed as ON-LINE ENGLISH (Cullinane) and
GRS Executive (Information Sciences))
Organization: [...] Waltham, Massachusetts
Purpose: [...] Retrieval.
Comments:
• Several hundred systems sold.
• Takes about 2 weeks to implement for a new data base.
• Written in PL-1.
• Available for mainframes.

System: PEARL (Based on SAM and PAM)
Organization: Cognitive Systems, New Haven, Connecticut
Purpose: Custom NLI's. The first system, Explorer, is an interface to an existing map
generating system. Others are interfaces to data bases.
Comments:
• Large start-up cost in building the knowledge base.
• [...] have been, and [...]
• Written in LISP.

System: STRAIGHT TALK (Derivative of LIFER)
Organization: Symantec, Sunnyvale, California
Purpose: Highly portable NLI [...] computers.
Comments:
• [...] be very compact and efficient.
• Available about Nov. 198[?].
• User customized.

System: SAVVY ($[...]/system)
Organization: SAVVY Marketing International, Sunnyvale, California
Purpose: System Interface for Micro-computers. (Other extensions underway).
Comments:
• Not linguistic. User adaptive [...] pattern matching to strings of characters.
• Released 3/82.
• User customized.



ERIC 



TABLE IV. Some Commercial Natural Language Systems (continued).

System: Weidner System
Organization: Weidner Communications Corp., Provo, Utah
Purpose: Semi-Automatic Natural Language Translation.
Comments:
• Linguistic approach. Written in FORTRAN IV.
• Translation with human editing is approximately 1000 words/hr (up to eight times as fast as
human alone).
• Approx. 20 sold by end of 1982, mainly to large multi-national corporations.

System: ALPS
Organization: ALPS, Provo, Utah
Purpose: Interactive Natural Language Translation.
Comments:
• Linguistic approach.
• Uses a dictionary that provides the various translations for technical words as a display to
the human translator, who then selects among the displayed words.

System: NLMENU
Organization: Texas Instruments, Inc., Dallas, Texas
Purpose: NLI to Relational Data Bases.
Comments:
• Menu Driven NL Query System.
• All queries constructed from the menu fall within the linguistic and conceptual coverage of
the system. Therefore, all queries entered are successful.
• Grammars used are semantic grammars written in a context-free grammar formalism.
• Producing an interface to any arbitrary set of relations is automated and only requires a
15-30 minute interaction with someone knowledgeable about the relations in question.
• System will be available late in 1983 as a software package for a microcomputer.
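
The menu-driven idea in the NLMENU entry can be pictured with a short sketch. It is only an
illustration under stated assumptions, not Texas Instruments' implementation: the toy semantic
grammar, its vocabulary, and the function names `expand`, `next_choices` and `is_complete` are
all invented here. At each step the user is offered only the words the grammar allows next, so
every finished query lies inside the system's coverage.

```python
# Hypothetical semantic grammar written in a context-free formalism.
# Keys are nonterminals; any other symbol is a word shown on the menu.
GRAMMAR = {
    "QUERY": [["find", "ATTR", "of", "ENTITY", "whose", "ATTR", "is", "VALUE"],
              ["find", "ATTR", "of", "ENTITY"]],
    "ATTR": [["name"], ["color"]],
    "ENTITY": [["parts"], ["suppliers"]],
    "VALUE": [["red"], ["blue"]],
}


def expand(symbols):
    """Yield every word sequence derivable from a list of grammar symbols."""
    if not symbols:
        yield []
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Replace the nonterminal by each of its right-hand sides in turn.
        for rule in GRAMMAR[head]:
            yield from expand(rule + rest)
    else:
        for tail in expand(rest):
            yield [head] + tail


# The toy grammar is not recursive, so its language is finite and can be listed.
SENTENCES = [tuple(s) for s in expand(["QUERY"])]


def next_choices(prefix):
    """Return the menu: words that can legally follow the words chosen so far."""
    prefix = tuple(prefix)
    n = len(prefix)
    return sorted({s[n] for s in SENTENCES if len(s) > n and s[:n] == prefix})


def is_complete(prefix):
    """True when the words chosen so far already form a full query."""
    return tuple(prefix) in SENTENCES
```

A session would call `next_choices` after each selection and stop when `is_complete` is true and
the user is satisfied. Listing the whole language in advance only works for a small non-recursive
grammar; a real system would need an incremental parser instead.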



K. State of the Art

[...] highly restricted contexts. [...] in a facile manner is still far off, requiring understanding
of where people are coming from: their knowledge, goals and moods.

[...] word or pattern [...] working systems, both understanding and text
generation, ATN-like grammars can be considered the state of the art.



L. Problems and Issues

1. How People Use Language

Many of the issues in natural language understanding center around the way people use
language. [...] the underlying motivation of a speech act is a major issue. Another issue is
understanding how humans process language, both in forming output and in interpreting input.

It also appears that knowledge-based inference is essential to natural language understanding,
as language just provides abbreviated cues that must be fleshed out using models and expectations
resident in the receiver. Finally, we do not even have a good handle on what it means to
understand language and what the relation is between language and perception.

2. Linguistics

[...] to determine their [...]. A complementary problem is dealing with novel language
such as metaphors, idioms, similes and analogies.

[...] in natural language processing. Where to attach modifying clauses is one problem.
However, even handling adverbial modifiers has proved [...].

[...] pragmatics, the study of language in context. Arden (1980, p. 474) notes [...]

[...] prototypes stored in a frame system can include both the prototypes for the domain being
discussed and those related to the conversational situation. [...] from assumptions about the
relevance of the answer to the question.

[...] dealing with a subset of pragmatic problems, there is as yet no theoretical approach [...]
in the 1980's.
3. Conversation

[...] unknown system. This is quite different from the closed world of many of the research NLP
systems.




"A major problem for NLP systems is following the dialogue context and being able to ascertain
the references of noun phrases by taking context into account." (Hendrix and Sacerdoti,
1981, p. 330)

Another major problem is understanding the motivation of the participants in the discourse in
order to penetrate their remarks. As conversational natural-language communication between
individuals is dependent on what the participants know about each other's knowledge, beliefs,
plans, and goals, methods for developing and incorporating this knowledge into a computer are
major issues.

4. Processor Design 

"While many specific problems are linguistic, ... many important problems are actually
general AI problems of representation and process organization." (Arden, 1980, p. 409)

A major issue in the design of a NLP system is choosing the tradeoffs between capability,
efficiency and simplicity. Also at issue are the language constructs to be handled, generality,
processing time and costs. The choice of the overall architecture of the system and the grammar
to be used is a major design decision for which there are as yet no general criteria.

Though all natural language processing [...], the practical design of applications of grammar
to NLP has proved difficult. The design of the parser in both theory and implementation is a
complex problem. Also at issue is the top-down (ATN-like) approach to parsing versus bottom-up
and combined approaches. In addition, how best to utilize knowledge sources (phonemic, lexical,
syntactic, semantic, etc.) in designing a parser and a system architecture remains a major issue.

A problem with the ATN parser approach, with its heavy dependence on syntax, is how it can
be adapted to handle ungrammatical inputs. Though considerable progress has been made, there
is as yet no clear solution. INTELLECT (a commercial ATN-based system) handles ungrammatical
constructions by relaxing syntactic constraints. IBM's Epistle System (Jensen and
Heidorn 1983) applies a fitting procedure to ungrammatical inputs to produce a reasonable
approximate parse. Semantic grammars and expectation-driven systems have an advantage in
overcoming ungrammatical inputs.
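
Relaxing a syntactic constraint can be illustrated with a small sketch. It is not the INTELLECT
or Epistle algorithm, whose details are not given here; the tiny lexicon, the single
subject-verb number-agreement constraint, and the functions `parse` and `parse_with_relaxation`
are assumptions made for the example. The parser first runs with the agreement check enforced
and, only if that fails, retries with the check switched off and reports what was relaxed.

```python
# Hypothetical lexicon: word -> (category, grammatical number or None).
LEXICON = {
    "the": ("DET", None),
    "ship": ("N", "sg"), "ships": ("N", "pl"),
    "sails": ("V", "sg"), "sail": ("V", "pl"),
}


def parse(words, enforce_agreement=True):
    """Parse a three-word 'DET N V' sentence; return a parse dict or None."""
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return None
    (c1, _), (c2, noun_number), (c3, verb_number) = (LEXICON[w] for w in words)
    if (c1, c2, c3) != ("DET", "N", "V"):
        return None
    # The syntactic constraint that may be relaxed: subject-verb agreement.
    if enforce_agreement and noun_number != verb_number:
        return None
    return {"subject": words[1], "verb": words[2]}


def parse_with_relaxation(sentence):
    """Try a strict parse first, then retry with agreement relaxed.

    Returns (parse, list of relaxed constraints); parse is None on failure.
    """
    words = sentence.lower().split()
    result = parse(words, enforce_agreement=True)
    if result is not None:
        return result, []
    result = parse(words, enforce_agreement=False)
    if result is not None:
        return result, ["subject-verb agreement"]
    return None, []
```

Reporting the relaxed constraint alongside the parse lets a caller decide whether to accept the
approximate reading silently or to confirm it with the user.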

Another major issue is: Is it appropriate to keep the semantic analysis separate from the
syntactic analysis or should the two work interactively? (see Charniak, 1981)

Also, is it necessary in NL translating or understanding to utilize an intermediate
representation, or can the final interpretation be gotten at more directly? If an intermediate
representation is to be used, which one is best? What is the appropriate role of primitive
concepts (such as found in case systems or conceptual dependency) in natural language processing?

How can we make restricted natural language more palatable to humans? A major problem is
the negative expectations created in the mind of a naive user when a system doesn't understand
an input sentence. Naive users have difficulty distinguishing between the limitations in a
system's conceptual coverage and the system's linguistic coverage. A related problem is the
system returning a null answer. This may mislead the user, as an answer may be null for many
reasons. Another problem is insuring a sufficiently rapid response to user inputs.
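
One way to keep a null answer from misleading the user is to return the reason together with
the result. The sketch below is a hypothetical illustration: the vocabulary, the toy data base,
and the function `answer` are assumptions and do not describe any system in this report. It
separates a failure of linguistic coverage, a failure of conceptual coverage, and a query that
was understood but matched no records.

```python
# Words the hypothetical interface can handle: its linguistic coverage.
VOCABULARY = {"list", "ships", "planes", "in", "boston", "norfolk"}
# What the toy data base actually knows about: its conceptual coverage.
DATABASE = {"ships": {"boston": ["Kennedy"], "norfolk": []}}


def answer(query):
    """Return (status, rows) so that an empty result is never ambiguous."""
    words = query.lower().split()
    unknown = [w for w in words if w not in VOCABULARY]
    if unknown:
        return ("outside linguistic coverage: " + ", ".join(unknown), [])
    entity = next((w for w in words if w in ("ships", "planes")), None)
    place = next((w for w in words if w in ("boston", "norfolk")), None)
    if entity is None or place is None:
        return ("could not interpret query", [])
    if entity not in DATABASE:
        return ("outside conceptual coverage: no data about " + entity, [])
    rows = DATABASE[entity].get(place, [])
    if not rows:
        return ("understood, but no matching records", [])
    return ("ok", rows)
```

With this shape, "list planes in boston" and "list ships in norfolk" both yield no rows, but the
status tells the user whether to rephrase, to ask about something else, or to accept the answer.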






One common problem with real systems is stonewalling behavior: the system not responding
to what the user is really after (the user's goal) because the user hasn't suitably worded the
input.

Some of the important problems and issues have to do with knowledge representation:
- Which knowledge representation is appropriate for a given problem?
- How to represent such things as space, time, events, human behavior, emotions, physical
  mechanisms and many processes associated with novel language?
- How can common sense and plausibility judgement (is that meaning possible?) be represented?
- How should items in memory be indexed and accessed?
- How should context be represented?
- How should memory be updated?
- How to deal with inconsistencies?
- How can we make the representations more precise?
- How can we make the system learn from experience so as to build up the large body of
  knowledge needed to deal with the real world?
- How can we build useful internal representations that correspond to 3D models, from
  information provided by natural language?

NLP usually takes the sentence as the basic unit to be analyzed. Assigning purpose and meaning
to larger units has proved difficult. The NRL Computational Linguistics Workshop (1981)
concluded that "Concept extraction was the most difficult task examined at the workshop. Success
depends on the adequacy of the situation-context representation and the development of more
sophisticated models of language use."

NLP has always pushed the limits of computer capability. Thus a current problem is designing
special computer architectures and processors for NLP.

5. Data Base Interfaces

Hendrix and Sacerdoti (1981, pp 318, 350) point out two problems particularly associated with
data base interfaces:

(1) The need to understand context throws considerable doubt on the idea of building
natural-language interfaces to systems with knowledge bases independent of the language
processing system itself.

(2) One of the practical problems [in] the use of NLP systems for accessing data bases is the
lack of trained people and good support tools for creating the knowledge structures needed for
each new data base.

6. Text Understanding 

Text understanding systems have encountered problems in achieving practicality, both in terms
of extending the knowledge of the language and in providing a sufficiently broad base of world
knowledge. The NRL Computational Linguistics Workshop (1981) concluded that "Current systems
for extracting information from military messages use the key word and key phrase methods
which are incapable of providing adequate semantic representation. In the immediate future,
more general methods for concept extraction probably will work well only in well defined
subfields that are carefully selected and painstakingly modeled."
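
The weakness of key word methods noted by the workshop can be shown in a few lines. This is a
minimal sketch with invented data: the key word table, the two sample messages, and the function
`extract_topics` are assumptions for illustration, not a description of any fielded system.

```python
# Hypothetical key word table: topic -> words that trigger it.
KEYWORDS = {
    "attack": {"attack", "attacked", "fired"},
    "movement": {"moving", "departed", "arrived"},
}


def extract_topics(message):
    """Return the sorted topics whose key words occur anywhere in the message."""
    cleaned = message.lower().replace(".", " ").replace(",", " ")
    words = set(cleaned.split())
    return sorted(topic for topic, keys in KEYWORDS.items() if words & keys)


MESSAGE_1 = "Convoy departed at dawn and was attacked near the strait."
MESSAGE_2 = "Convoy departed at dawn and was not attacked."
```

Both sample messages produce the same topic list, because matching on isolated words carries no
representation of negation, of who did what to whom, or of how the events relate. That is the
sense in which such methods cannot provide an adequate semantic representation.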






SRI and the National Library of Medicine have text understanding systems in the research
stage. SRI hand-codes logic formulas that describe the content of a paragraph. Queries are
matched against these paragraph descriptions.
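
Matching a query against hand-coded paragraph descriptions can be sketched as simple pattern
matching over facts. The representation below is an assumption made for illustration, since the
report does not describe SRI's formalism in this detail: facts are tuples, query variables are
strings beginning with "?", and the paragraph contents and the functions `match_fact` and
`query` are invented.

```python
# Each paragraph is described by hand-coded facts of the form (predicate, arg1, arg2).
PARAGRAPHS = {
    "p1": [("causes", "virus_a", "hepatitis"), ("affects", "hepatitis", "liver")],
    "p2": [("treats", "drug_x", "hepatitis")],
}


def match_fact(pattern, fact, bindings):
    """Match one query pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def query(patterns):
    """Return (paragraph id, bindings) for each way a paragraph satisfies all patterns."""
    results = []
    for pid, facts in PARAGRAPHS.items():
        candidates = [{}]
        for pattern in patterns:
            extended = []
            for bindings in candidates:
                for fact in facts:
                    new = match_fact(pattern, fact, bindings)
                    if new is not None:
                        extended.append(new)
            candidates = extended
        results.extend((pid, bindings) for bindings in candidates)
    return results
```

For example, `query([("causes", "?x", "?d"), ("affects", "?d", "?organ")])` retrieves paragraph
p1 and binds the shared variable `?d` consistently across both patterns.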

M. Research Required 

Current research in natural language processing systems includes machine translation,
information retrieval and interactive interfaces to computer systems. Important supporting
research topics are language and text analysis, user modeling, domain modeling, task modeling,
discourse modeling, reasoning and knowledge representation.

Much of the research required (as well as the research now underway) is centered around
addressing the problems and issues discussed in the following areas:

1. How People Use Language

The psychological mechanisms underlying human language production are a fertile field for
investigation. Efforts are needed to build explicit computational models to help explain why
human languages are the way they are and the role they play in human perception.

2. Linguistics

Further research is needed on methods for resolving ambiguities in language and for the
utilization of context in language understanding.

3. Conversation

Additional work is needed on ways to represent the huge amount of knowledge needed for
Natural Language Understanding (NLU).

A great deal of research is needed to give NLU systems the ability to understand not only what
is actually said, but the underlying intention as well.

Research is now underway by many groups on explicitly modeling goals, intentions and planning
abilities of people. Investigation of script and frame-based systems is currently the most
active NLP AI research area.

4. Processor Design

Architectures, grammars, parsing techniques and internal representations needed for NLP
systems remain important research areas.

One particularly fertile area is how to best utilize semantics to guide the path of the syntactic
parser. Charniak (1981, p 1085) indicates that a relatively unexplored area requiring research is
the interaction between the processes of language comprehension and the form of semantic
representation used.

Further work is needed on bringing multiple knowledge sources (KS's: syntactic, semantic,
pragmatic and contextual) to bear on understanding a natural language utterance, but still
keeping the KS's separate for easy updating and modification. Also needed is further work in AI
problem-solving to cope with the problem of finding an appropriate structure in the huge space
of possible meanings of a natural language input.

Improved NLU techniques are needed to handle complex notions such as disjunction,
quantification, implication, causality and possibility. Also needed are better methods for
handling "open worlds," where all things needed to understand the world are not in the system's
knowledge base.

Further research is also necessary to aid with a common source of trouble in NLP, that is,
dealing with syntactic and semantic ambiguities and how to handle metaphors and idioms.

Finally, the problems of efficiency, speed, portability, etc., discussed in the previous chapter,
all are in need of better solutions.

5. Data Base Interfaces

A current research topic is how data base schemas can best be enriched to support a natural
language interface, and what would be the best logical structure for a particular data base.

Research is also needed on more efficient methods for compiling a vocabulary for a particular
application.

6. Text Understanding

Seeking general methods of concept extraction remains one of the major research areas in
text understanding.

N. Principal U.S. Participants in NLP

1. Research and Development*

Non-Profit
SRI
MITRE

Universities
Yale U.: Dept of Computer Science
U. of CA, Berkeley: Computer Science Div., Dept of EECS
Carnegie-Mellon U.: Dept of Computer Science
U. of Illinois, Urbana: Coordinated Science Lab.
Brown U.: Dept of Computer Science
Stanford U.: Computer Science Dept.
U. of Rochester: Computer Science Dept.
U. of Mass, Amherst: Department of Computer and Information Science
SUNY, Stony Brook: Dept of Computer Science
U. of CA, Irvine: Computer Science Dept.
U. of PA: Dept of Computer and Infor. Science
GA Institute of Technology: School of Infor. and Computer Science
USC: Infor. Science Institute
MIT: AI Lab.
NYU: Computer Science Dept. and Linguistic String Project
U. of Texas at Austin: Dept of Computer Science
Cal. Inst. of Tech.
Brigham Young U.: Linguistics Dept.
Duke U.: Dept of Computer Science
N. Carolina State: Dept. of Computer Science
Oregon State U.: Dept of Computer Science

Industrial
BBN
TRW Defense Systems
IBM, Yorktown Heights, N.Y.
Burroughs
Sperry Univac
Systems Development Corp, Santa Monica
Hewlett Packard
Martin Marietta, Denver
Texas Instruments, Dallas
Xerox PARC
Bell Labs
Institute for Scientific Information, Phila., PA
GM Research Labs, Warren, MI
Honeywell

*A review of current research in NLP is given in Kaplan (1982).



2. Principal U.S. Government Agencies Funding [...]

ONR (Office of Naval Research)
NSF (National Science Foundation)
DARPA (Defense Advanced Research Projects Agency)

3. Commercial NLP Systems

Artificial Intelligence Corp., Waltham, Mass.
Cognitive Systems Inc., New Haven, Conn.
Symantec, Sunnyvale, CA.
Texas Instruments, Dallas, TX.
ALPS, Provo, UT.
Weidner Communications, Inc., Provo, Utah
SAVVY Marketing Inter., San Mateo, CA.

4. Non-U.S.

U. of Manchester, England
Kyoto U., Japan
Siemens Corp., Germany
U. of Strathclyde, Scotland
Centre National de la Recherche Scientifique, Paris
U. di Udine, Italy
U. of Cambridge, England
Philips Res. Labs, The Netherlands

O. Forecast

Commercial natural language interfaces (NLI's) to computer programs and data base management
systems are now becoming available. The imminent advent of NLI's for micro-computers is
the precursor for eventually making it possible for virtually anyone to have direct access to
powerful computational systems.

As the cost of computing has continued to fall, but the cost of programming hasn't, it has
already become cheaper in some applications to create NLI systems (that utilize subsets of
English) than to train people in formal programming languages.

Computational linguists and workers in related fields are devoting considerable attention to
the problems of NLP systems that understand the goals and beliefs of the individual
communicators. Though progress has been made, and feasibility has been demonstrated, more than
a decade will be required before useful systems with these capabilities will become available.

One of the problems in implementing new installations of NLP systems is gathering information
about the applicable vocabulary and the logical structure of the associated data bases. Work
is now underway to develop tools to help automate this task. Such tools should be available
within 5 years.

For text understanding, experimental programs have been developed that "skim" stylized text
such as short disaster stories in newspapers (DeJong, 1982). Despite the practical problems of
sufficient world knowledge and the extension of language knowledge required, practical tools
emerging from these efforts should be available to provide assistance to humans doing text
understanding within this decade.

The NRL Computational Linguistic Workshop (1981) concluded that text generation techniques
are maturing rapidly and new application possibilities will appear within the next five
years.

The NRL workshop also indicated that:

Machine aids for human translators appear to have a brighter prospect for immediate application
than fully-automatic translation; however, the Canadian French-English weather bulletin project
is a fully-automatic system in which only 20% of the translated sentences require minor
rewording before public release. An ambitious Common Market project involving machine
translation among six European languages is scheduled to begin shortly. Sixty people will be
involved in that undertaking, which will be one of the largest projects undertaken in
computational linguistics.* The panel was divided in its forecast on the five year perspective
of machine translation but the majority were very optimistic.

*EUROTRA: machine translation project sponsored by the European Common Market; 8 countries,
over 15 universities, $24 M over several years.






Nippon Telegraph and Telephone Corp. in Tokyo has a machine translation AI project underway.
An experimental system for translating from Japanese to English and vice versa is now being
demonstrated. In addition, the recently initiated Japanese Fifth Generation Computer effort has
computer-based natural language [...] as one of its major goals.

In summary, natural language interfaces using a limited subset of English are now becoming
available. Hundreds of specialized systems are already in operation. Major efforts in text
understanding and machine translation are underway, and useful (though limited) systems will be
available within the next five years. Systems that are heavily knowledge-based and handle more
complete sets of English should be available within this decade. However, systems that can
handle unrestricted natural discourse and understand the motivation of the communicators remain
a distant goal, probably requiring more than a decade before useful systems appear.

As natural language interfaces coupled to intelligent computer programs become widespread,
major changes in our society are likely to result. There is a trend now to replace relatively
unskilled white collar and factory work with trained computer personnel operating computer-based
systems. However, with the advent of friendly interfaces (and eventually even speech
understanding systems and automatic text generation from speech) relatively unskilled personnel
will be able to control complex machines, operations, and computer programs. As this occurs,
even relatively skilled factory and white collar work may be taken over by these lesser skilled
personnel with their computer aids, the experts and computer personnel moving on to develop new
programs and applications.

The outcome of such a revolution cannot be fully predicted at this time, other than to suggest 
that much of the power of the computer age will become available to everyone, requiring a 
rethinking of our national goals and life styles. 

P. Further Sources of Information

1. Journals

• American Journal of Computational Linguistics: published by the major society in NLP,
the Association for Computational Linguistics (ACL).

• SIGART Newsletter: ACM (Association for Computing Machinery)

• Artificial Intelligence

• AI Magazine: American Association for AI (AAAI)

• Pattern Analysis and Machine Intelligence: IEEE

• International Journal of Man Machine Interactions

2. Conferences

• Computational Linguistics (COLING): held biannually. Next one is in July 1984 at Stanford
University.

• International Joint Conference on AI (IJCAI): biannual. Current one in Germany, August
1983.

• ACL Annual Conference.

• AAAI annual conferences.

• ACM conferences.

• IEEE Systems, Man & Cybernetics Annual Conferences.

• Conference on Applied Natural Language Processing: Sponsored jointly by ACL & NRL,
Feb. 1983 in Santa Monica, CA.

3. Recent Books

• Winograd, T., Language as a Cognitive Process, Vol. I: Syntax, Reading, Mass: Addison-Wesley,
1983.

• Lehnert, W.G. and Ringle, M.H. (eds.), Strategies for Natural Language Processing,
Hillsdale, N.J.: Lawrence Erlbaum, 1982.

• Sager, N., Natural Language Information Processing, Reading, Mass: Addison-Wesley, 1981.

• Tennant, H., Natural Language Processing, New York: Petrocelli, 1981.

• Brady, M., Computational Approaches to [...], Cambridge, Mass: MIT Press, 1982.

• Joshi, A.K., Webber, B.L. and Sag, I.A. (eds), Elements of Discourse Understanding,
Cambridge: Cambridge University Press, 1981.

• L. Bolc (ed.), Natural Language Communication with Computers, Berlin: Springer-Verlag, 1981.

• L. Bolc (ed.), Data Base Question Answering Systems, Berlin: Springer-Verlag, 1982.

• Schank, R.C. and Riesbeck, C.K., Inside Computer Understanding, Hillsdale, N.J.:
Lawrence Erlbaum, 1981.



4. Overviews and Surveys

• Barr, A. and Feigenbaum, E.A., Chapter 4, "Understanding Natural Language," The
Handbook of Artificial Intelligence, Vol I, Los Altos, CA: W. Kaufmann, 1981, pp 223-322.

• S.J. Kaplan, "Special Section: Natural Language," SIGART Newsletter, No. 79, Jan. 1982,
pp 27-109.

• Charniak, E., "Six Topics in Search of a Parser: An Overview of AI Language Research,"
IJCAI-81, pp 1079-1087.

• Waltz, D.L., "The State of the Art in Natural Language Understanding." In Strategies for
Natural Language Processing, W.G. Lehnert and M.H. Ringle (eds), Hillsdale, N.J.:
Lawrence Erlbaum, 1982, pp. 3-32.

• Slocum, J., "A Practical Comparison of Parsing Strategies for Machine Translation and
Other Natural Language Processing Purposes," Tech. Report NL-41, Dept of C.S., U. of
Texas, Aug 1981.

• Hendrix, G.G. and Sacerdoti, E.D., "Natural-Language Processing: The Field in Perspective,"
Byte, Sept. 1981, pp 304-352.






REFERENCES 



• Arden, B.W. (ed), What Can Be Automated? (COSERS), Cambridge, Mass: MIT Press, 1980.

• Barr, A. and Feigenbaum, E.A., Chapter 4, "Understanding Natural Language," The Handbook
of Artificial Intelligence, Los Altos, CA: W. Kaufmann, 1981, pp 223-321.

• Barrow, H.G., "Artificial Intelligence: State of the Art," Technical Note 198, Menlo Park,
CA: SRI International, Oct. 1979.

• Brown, J.S. and Burton, R.R., "Multiple Representations of Knowledge for Tutorial
Reasoning," In Representation and Understanding, D.G. Bobrow and A. Collins (Eds.), New York:
Academic Press, 1975.

• Burton, R.R., "Semantic Grammar: An Engineering Technique for Constructing Natural
Language Understanding Systems," BBN Report 3453, BBN, Cambridge, Dec. 1976.

• Charniak, E., "Six Topics in Search of a Parser: An Overview of AI Language Research,"
IJCAI-81, pp 1079-1087.

• Charniak, E. and Wilks, Y., Computational Semantics, Amsterdam: North Holland, 1976.

• Chomsky, N., Syntactic Structures, The Hague: Mouton, 1957.

• DeJong, G., "An Overview of the FRUMP System." In Strategies for Natural Language
Processing, W.G. Lehnert and M.H. Ringle (eds), Hillsdale, N.J.: Lawrence Erlbaum, 1982, pp
149-176.

• Fillmore, C., "Some Problems for Case Grammar." In R.J. O'Brien (Ed.), Report of the
Twenty-Second Annual Round Table Meeting on Linguistics and Language Studies, Wash.,
D.C.: Georgetown U. Press, 1971, pp. 35-56.

• Finin, T.W., "The Semantic Interpretation of Compound Nominals," Ph.D. Thesis, U. of
Ill., Urbana, 1980.

• Gawron, J.M. et al., "Processing English with a Generalized Phrase Structure Grammar,"
Proc. of the 20th Meeting of ACL, U. of Toronto, Canada, 16-18 June 1982, pp. [...]

• Gazdar, G., "Unbounded Dependencies and Coordinate Structure," Linguistic Inquiry, 12,
1981, pp. 155-184.

• Gevarter, W.B., An Overview of Computer Vision, NBSIR 82-2582, National Bureau of
Standards, Wash., D.C., September 1982.

• Gevarter, W.B., An Overview of Artificial Intelligence and Robotics, Vol. 1, NBS (in press),
1983.

• Graham, N., Artificial Intelligence, Blue Ridge Summit, PA: TAB Books, 1979.

• Hendrix, G.G. and Sacerdoti, E.D., "Natural-Language Processing: The Field in Perspective,"
Byte, Sept. 1981, pp. 304-352.

• Hendrix, G.G., Sacerdoti, E.D., Sagalowicz, D., and Slocum, J., "Developing a Natural
Language Interface to Complex Data," ACM Transactions on Database Systems, Vol. 3, No. 2,
June 1978.






• Jensen, K. and Heidorn, G.E., "The Fitted Parse: 100% Parsing Capability in a Syntactic
Grammar of English," Conf. on Applied NLP, Santa Monica, CA, Feb. 1983, pp. 93-98.

• Kaplan, S.J. (Ed.), "Special Section: Natural Language," SIGART Newsletter #79,
Jan. 1982, pp. 27-109.

• McDonald, D.B., "Understanding Noun Compounds," Ph.D. Thesis, Carnegie-Mellon U.,
Pittsburgh, 1982.

• McDonald, D.B., "Natural Language Production as a Process of Decision-Making Under
Constraints," Ph.D. Thesis, M.I.T., Cambridge, 1980.

• Nishida, T. and Doshita, S., "An Application of Montague Grammar to English-Japanese
Machine Translation," Proc. of Conf. on Applied NLP, Santa Monica, Feb. 1983.

• Rieger, C. and Small, S., "Word Expert Parsing," Proc. of the Sixth International Joint
Conference on Artificial Intelligence, 1979, pp. 723-728.

• Robinson, A.E. et al., "Interpreting Natural Language Utterances in Dialog about Tasks,"
AI Center TN 210, SRI Inter., Menlo Park, CA, 1980.

• Schank, R.C. and Abelson, R.P., Scripts, Plans, Goals and Understanding, Hillsdale, N.J.:
Lawrence Erlbaum, 1977.

• Schank, R.C. and Riesbeck, C.K., Inside Computer Understanding, Hillsdale, N.J.:
Lawrence Erlbaum, 1981.

• Schank, R.C. and Yale AI Project, "SAM: A Story Understander," Research Rept 43,
Dept of Comp Sci, Yale U., 1975.

• Slocum, J., "A Practical Comparison of Parsing Strategies for Machine Translation and
Other Natural Language Purposes," Ph.D. Thesis, U. of Texas, Austin, 1981.

• Tennant, H., Natural Language Processing, New York: Petrocelli, 1981.

• Waltz, D.L., "Natural Language Access to a Large Data Base," In Advance Papers of the
International Joint Conference on Artificial Intelligence, Cambridge, Mass: MIT, 1975.

• Waltz, D.L., "The State of the Art in Natural Language Understanding." In Strategies for
Natural Language Processing, W.G. Lehnert and M.H. Ringle (eds), Hillsdale, N.J.: Lawrence
Erlbaum, 1982, pp. 3-32.

• Webber, B.L. and Finin, T.W., "Tutorial on Natural Language Interfaces, Part 1: Basic
Theory and Practice," AAAI-82 Conference, Pittsburgh, PA, Aug. 17, 1982.

• Wilks, Y., "A Preferential Pattern-Seeking Semantics for Natural Language Processing,"
Artificial Intelligence, Vol. [...]

• Winograd, T., Understanding Natural Language, New York: Academic Press, 1972.

• Winograd, T., Language as a Cognitive Process, Vol. I: Syntax, Reading, Mass: Addison-Wesley,
1983.

• Woods, W.A., "Progress in Natural Language Understanding: An Application to Lunar
Geology," In Proc. of the National Computer Conference, Montvale, N.J.: AFIPS Press, 1973.

• Woods, W.A., "Cascaded ATN Grammars," Amer. J. of Computational Linguistics, Vol.
6, No. 1, 1980, pp. 1-12.

• "Applied Computational Linguistics in Perspective," NRL Workshop at Stanford University,
26-27 June 1981. (Proceedings in American Journal of Computational Linguistics, Vol. 8,
No. 2, April-June 1982, pp 55-83.)






GLOSSARY 



Anaphora: The repetition of a word or phrase at the beginning of successive statements,
questions, etc.

C.A.I.: Computer-Aided Instruction

Case: A semantically relevant syntactic relationship.

Case Frame: An ordered set of cases for each verb form.

Case Grammar: A form of Transformational Grammar in which the deep structure is based on
cases.

Computational Linguistics: The study of processing language with a computer.

Conceptual Dependency (CD): An approach, related to case frames, in which sentences are
translated into basic concepts expressed in a small set of semantic primitives.

DB: Data Base

DBMS: Data Base Management System

Deep Structure: The underlying formal canonical syntactic structure, associated with a sentence,
that indicates the sense of the verbs and includes subjects and objects that may be implied
but are missing from the original sentence.

Discourse: Conversation, or exchange of ideas.

Domain: Subject area of the communication.

Frame: A data structure for grouping information on a whole situation, complex object, or series
of events.

Grammar: A scheme for specifying the sentences allowed in a language, indicating the syntactic
rules for combining words into well-formed phrases and clauses.

Heuristic: Rule of thumb or empirical knowledge used to help guide a solution.

KB: Knowledge Base

Lexicon: A vocabulary or list of words relating to a particular subject or activity.

Linguistics: The scientific study of language.

Morphology: The arrangement and interrelationship of morphemes in words.

Morpheme: The smallest meaningful unit of a language, whether a word, base or affix.

Network Representation: A data structure consisting of nodes and labeled connecting arcs.

NL: Natural Language

NLI: Natural Language Interface

NLP: Natural Language Processing

NLU: Natural Language Understanding

Parse Tree: A tree-like data structure of a sentence, resulting from syntactic analysis, that
shows the grammatical relationships of the words in the sentence.

Parsing: Processing an input sentence to produce a more useful representation.

Phonemes: The fundamental speech sounds of a language.






Phrase Structure Grammar: Also referred to as Context Free Grammar. Type 2 of a series of
grammars defined by Chomsky. A relatively natural grammar, it has been one of the most
useful in natural-language processing.

Pragmatics: The study of the use of language in context.

Script: A frame-like data structure for representing stereotyped sequences of events to aid
in understanding simple stories.

Semantic Grammar: A grammar for a limited domain that, instead of using conventional
syntactic constituents such as noun phrases, uses meaningful components appropriate to the
domain.

Semantics: The study of meaning.

Sense: Meaning.

Surface Structure: A parse tree obtained by applying syntactic analysis to a sentence.

Syntax: The study of arranging words in phrases and sentences.

Template: A prototype model or structure that can be used for sentence interpretation.

Tense: A form of a verb that relates it to time.

Transformational Grammar: A phrase structure grammar that incorporates transformational
rules to obtain the deep structure from the surface structure.
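
Several of the glossary entries (Grammar, Phrase Structure Grammar, Parsing, Parse Tree) fit
together as in the following sketch. It is a toy illustration only: the three-rule grammar, the
six-word lexicon, and the functions `parse`, `expand` and `parse_sentence` are assumptions
chosen for the example. The parser works top-down with backtracking and returns the parse tree
as nested tuples.

```python
# Hypothetical phrase structure (context-free) grammar.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Hypothetical lexicon: word -> syntactic category.
LEXICON = {"the": "DET", "a": "DET", "dog": "N", "ball": "N",
           "sees": "V", "sleeps": "V"}


def parse(symbol, words, pos):
    """Yield (tree, next position) for each way `symbol` can match at words[pos]."""
    if symbol in GRAMMAR:
        for rule in GRAMMAR[symbol]:
            yield from expand(rule, words, pos, (symbol,))
    elif pos < len(words) and LEXICON.get(words[pos]) == symbol:
        yield (symbol, words[pos]), pos + 1


def expand(rule, words, pos, tree):
    """Match the symbols of one rule left to right, growing the tree as it goes."""
    if not rule:
        yield tree, pos
        return
    for subtree, next_pos in parse(rule[0], words, pos):
        yield from expand(rule[1:], words, next_pos, tree + (subtree,))


def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None
```

For "the dog sleeps" the parser first tries the rule VP -> V NP, fails to find a noun phrase
after the verb, and backs up to VP -> V. That backtracking step is the cost a top-down approach
pays for its simplicity.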






